Session 5a: Acoustic Modeling
Abstract
The session focused on acoustic modeling for speech recognition, which can be divided into three broad sub-areas: (1) feature extraction, (2) modeling the features for the speech source, and (3) estimation of the model parameters. The papers in this session touched on all of these areas. Huang focuses on the feature representation. Furui et al., Austin et al., and Kimball et al. discuss new models for the speech source. Hon and Lee, Hwang and Huang, and Gauvain and Lee focus on parameter estimation issues.

In "Minimizing Speaker Variation Effects for Speaker-Independent Speech Recognition," Huang discusses a feature representation for speech recognition that is less sensitive to the speaker. It is a cepstral mapping technique in which the mapping is done with neural networks. A codeword-dependent cepstral-mapping network is estimated for each of a group of different speaker types. This cepstral mapping improves the speaker-independent performance of the CMU system.

In "Recent Topics in Speech Recognition Research at NTT Laboratories," Furui et al. discuss three topics. The first topic focuses on an improved model for speech. Typical HMM recognition systems make frame-to-frame independence assumptions. Furui presented a technique aimed at minimizing the effect of these assumptions, using bigram-constrained HMMs, and showed an improvement when using this technique. He also discussed two issues in language modeling, one specific to Japanese, and another showing how task-independent language models can be adapted to a task at hand.

Austin points out that neural networks can be combined with HMMs to automatically derive segment-level acoustic models that reduce the effect of frame-to-frame independence assumptions in standard HMMs. He shows that proper parameter estimation techniques are key for these models and presents a technique called N-best training, which improves the performance of his segmental model.

Kimball focuses on the segmentation aspects of segmental models. He shows that incorporating a probabilistic segmentation model improves the performance of the Boston University speech recognition system.

The following three papers discuss the area of parameter estimation. "Vocabulary and Environment Adaptation in Vocabulary-Independent Speech Recognition" by Hon and Lee revisits the area of task independence, but this time from an acoustic point of view. Traditional HMM-based speech-recognition systems work much better if their acoustic training data use the same task/vocabulary as the testing data. Hon and Lee look at techniques for making the training data more general. In particular, they examine novel techniques that improve vocabulary-independent performance by making the parameter-estimation technique focus on the testing vocabulary. …
Similar resources
Session 2: Language Modeling
This session presented four interesting papers on statistical language modeling aimed at improving large-vocabulary speech recognition. The basic problem in language modeling is to derive accurate underlying representations from a large amount of training data, a problem it shares with acoustic modeling. As demonstrated in this session, many techniques used for acoustic mode...
Session 8: Speech II
This session contains four papers that describe new techniques and recent advances in acoustic modeling. This is an extremely important area of research. Throughout the past twenty years, as computers became more powerful and speech data more abundant, new directions in acoustic modeling further advanced the state of the art. The first two papers describe novel techniques that may lead to parad...
Session 9: Speech III
This session consisted of five papers whose contents spanned a broad range of topics in speech recognition. They dealt with problems in the basic areas of acoustic modeling, statistical language modeling, and recognition search techniques, as well as adaptation of both the acoustic and language models to new data. All papers included experimental test results on well-known data sets and cond...
Session: 5A
ACOUSTIC WAVES MEASUREMENTS ON SNGS CRYSTALS AND DETERMINATION OF MATERIAL CONSTANTS E. CHILLA*1, R. KUNZE2, A. SOTNIKOV2, M. WEIHNACHT2, J. BOHM3, R. B. HEIMANN4, M. HENGST4, and U. STRAUBE5, 1VI Tele Filter GmbH, Teltow, Germany, 2Leibniz Institute for Solid State and Materials Research, Dresden, Germany, 3Institute for Crystal Growth, Berlin, Germany, 4Freiberg University of Mining and Techn...
Session 3: Continuous Speech Recognition
The papers in this session focus on techniques for and applications of large-vocabulary continuous speech recognition. The technique-oriented papers discuss techniques for channel compensation, fast search, acoustic modeling, and adaptive language modeling. The applications-oriented papers discuss methods for using recognizers for language identification, speaker identification, speaker sex iden...